Skip to content

GH-126910: Make _Py_get_machine_stack_pointer return the actual stack pointer#149103

Open
markshannon wants to merge 4 commits intopython:mainfrom
markshannon:use-real-stack-pointer
Open

GH-126910: Make _Py_get_machine_stack_pointer return the actual stack pointer#149103
markshannon wants to merge 4 commits intopython:mainfrom
markshannon:use-real-stack-pointer

Conversation

@markshannon
Copy link
Copy Markdown
Member

@markshannon markshannon commented Apr 28, 2026

(or something close it), but not the frame pointer

  • Make _Py_ReachedRecursionLimit inline again
  • Remove _Py_MakeRecCheck replacing its use with _Py_ReachedRecursionLimit
  • Move the check for C stack swtiching into _Py_CheckRecursiveCall

This is a rebase of #147945 which was reverted due to a non-reproducable buildbot failure.

…ething close it), but not the frame pointer

* Make _Py_ReachedRecursionLimit inline again
* Remove _Py_MakeRecCheck replacing its use with _Py_ReachedRecursionLimit
* Move the check for C stack swtiching into _Py_CheckRecursiveCall
@read-the-docs-community
Copy link
Copy Markdown

read-the-docs-community Bot commented Apr 29, 2026

Documentation build overview

📚 cpython-previews | 🛠️ Build #32465483 | 📁 Comparing 01fe604 against main (d4eee16)

  🔍 Preview build  

6 files changed · ± 6 modified

± Modified

@markshannon markshannon added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Apr 29, 2026
@bedevere-bot
Copy link
Copy Markdown

🤖 New build scheduled with the buildbot fleet by @markshannon for commit 01fe604 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F149103%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Apr 29, 2026
@markshannon markshannon added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label May 6, 2026
@bedevere-bot
Copy link
Copy Markdown

🤖 New build scheduled with the buildbot fleet by @markshannon for commit 98073f5 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F149103%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label May 6, 2026
@markshannon
Copy link
Copy Markdown
Member Author

@pablogsal
Can you tell what is going on with the buildbot/AMD64 CentOS9 NoGIL PR buildbot?
The test_python_calls_appear_in_the_stack_if_perf_activated test is failing.
Interstingly py_trampoline_evaluator appears 1000s of times in the test output.
Isn't that supposed to show as the Python function in perf?

@pablogsal
Copy link
Copy Markdown
Member

@pablogsal

Can you tell what is going on with the buildbot/AMD64 CentOS9 NoGIL PR buildbot?

The test_python_calls_appear_in_the_stack_if_perf_activated test is failing.

Interstingly py_trampoline_evaluator appears 1000s of times in the test output.

Isn't that supposed to show as the Python function in perf?

Perf should add the Python function below that one, yes. I need time to investigate I will try to do it this week but I am a bit overwhelmed with pending things. I will try to take a look as soon as possible.

@pablogsal
Copy link
Copy Markdown
Member

pablogsal commented May 7, 2026

@pablogsal
Can you tell what is going on with the buildbot/AMD64 CentOS9 NoGIL PR buildbot?
The test_python_calls_appear_in_the_stack_if_perf_activated test is failing.
Interstingly py_trampoline_evaluator appears 1000s of times in the test output.
Isn't that supposed to show as the Python function in perf?

I spent several hours digging into this one and was able to reproduce theCentOS9 NoGIL failure locally in a CentOS Stream 9 podman container with GCC 11.5 and perf 5.14.

The short version is: the PR itself did not break perf-map generation. The generated perf map still contains the py::foo, py::bar, and py::baz entries. What broke is perf's ability to unwind into those generated trampoline
frames on this builder.

On the failing build, perf script showed lots of py_trampoline_evaluator frames, but none of the generated py::foo/bar/baz symbols. That means the trampoline machinery was active, but perf could not walk the frame-pointer chain far enough to resolve the generated Python frames. The regression comes from the change in _Py_get_machine_stack_pointer(). Before this PR, it used:

__builtin_frame_address(0)

After this PR, on x86-64 it reads the real stack pointer with inline asm:

__asm__("{movq %%rsp, %0" : "=r" (result));

The new behavior is the right semantic direction for the stack-pointer work, but the old __builtin_frame_address(0) had an accidental side effect: it forced GCC to materialize an rbp frame pointer in _PyEval_EvalFrameDefault().

That matters because the perf trampoline test depends on frame-pointer unwinding.

On the failing CentOS9/GCC 11.5 build, _PyEval_EvalFrameDefault() also has this function-specific optimization attribute:

__attribute__((optimize ("no-tree-slp-vectorize")))

With the new %rsp helper, GCC generated _PyEval_EvalFrameDefault() without the normal frame-pointer prologue:

_PyEval_EvalFrameDefault:
    push   %r15
    push   %r14
    push   %r13
    push   %r12
    push   %rbx
    sub    $0x180,%rsp
    ...
    mov    %rsp,%rax

With the old __builtin_frame_address(0) helper, the same build generated:

_PyEval_EvalFrameDefault:
    push   %rbp
    mov    %rsp,%rbp
    push   %r15
    ...

So before this PR, the test passed because __builtin_frame_address(0) was preserving the frame-pointer chain. After this PR, that accidental protection disappeared, and perf's frame-pointer unwinder could no longer walk through _PyEval_EvalFrameDefault() into the generated trampoline frame.

The fix is to keep the new real stack-pointer behavior, but make the eval-loop optimization attribute explicitly preserve frame pointers too:

#define DONT_SLP_VECTORIZE \
    __attribute__((optimize ("no-tree-slp-vectorize", "no-omit-frame-pointer")))

I verified this in the CentOS Stream 9 reproduction:

  1. New %rsp helper plus the old DONT_SLP_VECTORIZE: reproduces the buildbot failure.
  2. Reverting only _Py_get_machine_stack_pointer() to__builtin_frame_address(0): test passes.
  3. New %rsp helper plus optimize("no-tree-slp-vectorize", "no-omit-frame-pointer"): test passes.
  4. Disassembly after the fix again shows _PyEval_EvalFrameDefault() starting with:
push   %rbp
mov    %rsp,%rbp

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants